ABSTRACT

Separating the cosmological redshifted 21-cm signal from foregrounds is a major
challenge. We present the cross-correlation of the redshifted 21-cm emission from
neutral hydrogen (HI) in the post-reionization era with the Ly-α forest as a new
probe of the large-scale matter distribution in the redshift range z = 2 to 3 without
the problem of foreground contamination. Though the 21-cm and the Ly-α forest
signals originate from different astrophysical systems, they are both expected to trace
the underlying dark matter distribution on large scales. The multi-frequency angular
cross-correlation power spectrum estimator is found to be unaffected by the discrete
quasar sampling, which only affects the noise in the estimate.

We consider a hypothetical redshifted 21-cm observation in a single field of view
1.3° (FWHM) centered at z = 2.2 where the binned 21-cm angular power spectrum
can be measured at an SNR of $3\sigma$ or better across the range $500 < \ell < 4000$.
Keeping the parameters of the 21-cm observation fixed, we have estimated the SNR for the
cross-correlation signal varying the quasar angular number density $n$ of the Ly-α
forest survey. Assuming that the spectra have SNR ~ 5 in pixels of length 44 km/s, we
find that a $5\sigma$ detection of the cross-correlation signal is possible at
$600 < \ell < 2000$ with $n = 4\ {\rm deg}^{-2}$. This value of $n$ is well within the reach
of upcoming Ly-α forest surveys. The cross-correlation signal will be a new, independent
probe of the astrophysics of the diffuse IGM, the growth of structure and the expansion
history of the Universe.
1 INTRODUCTION.

Observations of the redshifted 21-cm radiation from neutral hydrogen (HI) provide a unique
opportunity for probing the cosmological matter distribution over a wide range of redshifts
(0 < z < 200), and considerable effort is currently underway towards detecting this signal
(Furlanetto et al., 2006; Lewis & Challinor, 2007; Morales & Wyithe, 2009). Foregrounds from
other astronomical sources, which are several orders of magnitude larger, however, pose a
severe challenge for detecting this signal (Santos et al., 2005; McQuinn et al., 2006; Ali
et al., 2008). The 21-cm emission from the post-reionization era (z < 6) is of particular
interest (Saini et al., 2001; Bharadwaj & Sethi, 2001; Bharadwaj et al., 2001; Wyithe & Loeb,
2007) because the foregrounds are relatively smaller and the HI is expected to trace the
underlying dark matter with a possible bias. These observations hold the possibility of
measuring both the matter power spectrum and the cosmological parameters (Wyithe et al.,
2007; Chang et al., 2008; Bharadwaj et al., 2009).

Interestingly, diffuse HI in the intervening intergalactic medium (IGM) in the same z range
produces a large number of absorption lines (the Lyman-α forest) in the spectra of distant
quasars (QSOs). These low neutral density absorption lines are caused by small baryonic
fluctuations in the IGM and have the potential to probe the matter distribution and baryonic
structure formation down to very small scales. Here the Ly-α forest, whose fluctuations are
believed to trace the underlying dark matter, is of special interest. The Ly-α forest is known
to be a valuable cosmological probe (Mandelbaum et al., 2003). It has found a variety of
applications which include determining the matter power spectrum (Croft et al., 1998, 1999;
Lesgourgues et al., 2007), cosmological parameter estimation (Seljak et al., 2006; McDonald &
Eisenstein, 2007; Gratton et al., 2008), constraining the clustering properties of dark matter
on small scales (Viel et al., 2008) and probing the reionization history (Hui & Gnedin, 1997;
Gallerani et al., 2006; Cen et al., 2009).

Though the 21-cm emission and the Ly-α forest both originate from HI at the same z, these
two signals originate from two different kinds of astrophysical systems. The Ly-α forest
originates from small HI fluctuations present in the primarily ionized IGM; the 21-cm emission
from these regions is completely negligible. On the other hand, the bulk of the 21-cm signal
originates from Damped Ly-α Absorbers (DLAs) which contain most of the neutral hydrogen at
these epochs (Lanzetta et al., 1995; Storrie-Lombardi et al., 1996; Péroux et al., 2003). It is,
however, reasonable to assume that on large scales both trace the same underlying dark matter,
and hence we may expect them to be correlated.

In this paper, we propose a novel probe of the large-scale matter distribution using the
cross-correlation of the 21-cm brightness temperature and the Ly-α forest transmitted flux. The
cross-correlation signal holds the potential of independently unveiling the same astrophysical
and cosmological information as the individual auto-correlations, with the added advantage that
the problems of foregrounds and systematics are expected to be much less severe for the
cross-correlation. We note earlier studies that consider the possibility of cross-correlating
the Ly-α forest with the CMBR (Croft et al., 2006) and weak lensing (Vallinotto et al., 2009),
and cross-correlating the post-reionization 21-cm signal with the CMBR (Guha Sarkar et al.,
2009) and weak lensing (Guha Sarkar, 2010). The cosmological 21-cm signal has recently been
detected through cross-correlations with the 6dfGRS (Pen et al., 2009) and the DEEP2 optical
galaxy redshift survey (Chang et al., 2010).



2 THE CROSS-CORRELATION ANGULAR POWER SPECTRUM. 

The fluctuations in the transmitted flux $\mathcal{F}(\hat{n}, z)$ along a line of sight $\hat{n}$
in the Ly-α forest may be quantified using
$\delta_{\mathcal{F}}(\hat{n}, z) = \mathcal{F}(\hat{n}, z)/\bar{\mathcal{F}} - 1$. At the large
scales of interest here it is reasonable to adopt the fluctuating Gunn-Peterson approximation
(Gunn & Peterson, 1965; Bi & Davidsen, 1997; Croft et al., 1998, 1999) which relates the flux and
the matter density contrast $\delta$ as $\mathcal{F} = \exp[-A(1 + \delta)^{\kappa}]$ where $A$ and
$\kappa$ are two redshift dependent functions. The function $A$ is of order unity and depends on
the mean flux level, IGM temperature, photo-ionization rate and cosmological parameters, while
$\kappa$ depends on the IGM temperature density relation (McDonald et al., 2001; Choudhury et al.,
2001). For a preliminary analytic estimate of the cross-correlation signal, we assume that
$\delta_{\mathcal{F}}$ has been smoothed whereby it is adequate to retain only the linear term
$\delta_{\mathcal{F}} \propto \delta$ (Croft et al., 1998; Bi & Davidsen, 1997; Viel et al., 2002;
Slosar et al., 2009). The higher order terms, which have been dropped to keep the analytic
calculations tractable, will, in principle, contribute to the cross-correlation. We plan to
address this in future studies using simulations.

In the redshift range of our interest (z < 3.5) the fluctuation in the redshifted 21-cm brightness
temperature $\delta_T(\hat{n}, z)$ traces the underlying dark matter distribution with a possible
scale dependent bias function $b_T(k, z)$. The bias is expected to be scale dependent below the
Jeans length-scale (Fang et al., 1993), and fluctuations in the ionizing background (Wyithe &
Loeb, 2007, 2009) also give rise to a scale dependent bias. Further, this bias is found to grow
monotonically with z for 1 < z < 4 (Marin et al., 2009). However, the simulations of Bagla et al.
(2009) and also Wyithe & Loeb (2009) indicate that a constant, scale independent bias is adequate
at the large scales of our interest ($\ell < 6000$ at $z \sim 2.2$). We have used the constant
value $b_T = 2$ in our analysis.
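
The linearisation of the fluctuating Gunn-Peterson relation used above can be checked numerically. The following is a minimal sketch, not part of the original analysis: the values `A_ASSUMED = 1.0` and `KAPPA_ASSUMED = 1.6` are illustrative assumptions (the text only states that A is of order unity), and the flux contrast is normalised by the flux at mean density rather than by the true mean flux, which agree to first order in the density contrast.

```python
import math

# Illustrative values only: the text says A is of order unity; kappa = 1.6 is an assumption.
A_ASSUMED = 1.0
KAPPA_ASSUMED = 1.6

def fgpa_flux(delta, A=A_ASSUMED, kappa=KAPPA_ASSUMED):
    """Transmitted flux F = exp[-A (1 + delta)^kappa] for density contrast delta."""
    return math.exp(-A * (1.0 + delta) ** kappa)

def exact_contrast(delta, A=A_ASSUMED, kappa=KAPPA_ASSUMED):
    """Flux contrast relative to the flux at mean density (delta = 0)."""
    return fgpa_flux(delta, A, kappa) / fgpa_flux(0.0, A, kappa) - 1.0

def linear_contrast(delta, A=A_ASSUMED, kappa=KAPPA_ASSUMED):
    """First-order Taylor term of exact_contrast about delta = 0: -A * kappa * delta."""
    return -A * kappa * delta

if __name__ == "__main__":
    for delta in (0.01, 0.05, 0.2):
        print(delta, exact_contrast(delta), linear_contrast(delta))
```

The two columns agree closely for small density contrasts and drift apart as the contrast grows, which is why the linear term is only adequate for a smoothed field on large scales.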

With these assumptions and incorporating redshift space distortions we may express both
$\delta_{\mathcal{F}}$ and $\delta_T$ as

$$\delta_a(\hat{n}, z) = C_a \int \frac{d^3k}{(2\pi)^3}\, e^{i \mathbf{k}\cdot\hat{n}\, r}\, [1 + \beta_a \mu^2]\, \Delta(\mathbf{k}) , \tag{1}$$

where $a = \mathcal{F}, T$ refers to the Ly-α flux and 21-cm brightness temperature respectively,
$r$ is the comoving distance, $\Delta(\mathbf{k})$ is the dark matter density contrast in Fourier
space and $\mu = \hat{k}\cdot\hat{n}$. We adopt $C_{\mathcal{F}} = -0.13$ and
$\beta_{\mathcal{F}} = 1.58$ from numerical simulations of the Ly-α forest (McDonald, 2003).

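For the 21-cm we use $C_T = \bar{T}\, \bar{x}_{\rm HI}\, b_T$ and $\beta_T = f/b_T$ (Bharadwaj & Ali,
2004, 2005), where

$$\bar{T}(z) = 4.0\,{\rm mK}\,(1+z)^2 \left(\frac{\Omega_b h^2}{0.02}\right) \left(\frac{0.7}{h}\right) \left(\frac{H_0}{H(z)}\right) , \tag{2}$$

$\bar{x}_{\rm HI}$ is the mean neutral hydrogen fraction, $f$ is the linear growth parameter of
density fluctuations and $b_T$ is the bias. At redshifts $0 < z < 3.5$ we have
$\Omega_{\rm gas} \sim 10^{-3}$ (Lanzetta et al., 1995; Storrie-Lombardi et al., 1996) which implies
that $\bar{x}_{\rm HI} = 50\, \Omega_{\rm gas} h^2\, (0.02/\Omega_b h^2) = 2.45 \times 10^{-2}$, the
value used here. As mentioned earlier, numerical simulations (Khandai et al., 2009) suggest that
$b_T \simeq 2$, which we adopt here.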
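
As a rough numerical illustration of the 21-cm amplitude and distortion parameters, the sketch below evaluates them at z = 2.2. Several ingredients are assumptions rather than values from the text: the mean brightness temperature is taken in the commonly used form 4.0 mK (1+z)^2 (Ω_b h^2/0.02)(0.7/h)(H_0/H(z)), the cosmology is flat ΛCDM with `OMEGA_M = 0.28` and `H_LITTLE = 0.7`, and the growth parameter uses the approximation f ≈ Ω_m(z)^0.55. Only the neutral fraction and the bias are the values quoted above.

```python
import math

# Assumed flat LCDM parameters (illustrative; not taken from the text).
OMEGA_M = 0.28
H_LITTLE = 0.7
OMEGA_B_H2 = 0.02      # fiducial value consistent with the quoted neutral fraction
X_HI = 2.45e-2         # mean neutral hydrogen fraction quoted in the text
B_T = 2.0              # constant bias adopted in the text

def hubble_ratio(z):
    """H(z)/H0 for flat LCDM with matter and a cosmological constant."""
    return math.sqrt(OMEGA_M * (1.0 + z) ** 3 + (1.0 - OMEGA_M))

def mean_brightness_temperature_mK(z):
    """Assumed form: 4.0 mK (1+z)^2 (Omega_b h^2 / 0.02) (0.7 / h) (H0 / H(z))."""
    return (4.0 * (1.0 + z) ** 2 * (OMEGA_B_H2 / 0.02)
            * (0.7 / H_LITTLE) / hubble_ratio(z))

def growth_rate(z):
    """Approximation f ~ Omega_m(z)^0.55 (an assumption, not from the text)."""
    omega_m_z = OMEGA_M * (1.0 + z) ** 3 / hubble_ratio(z) ** 2
    return omega_m_z ** 0.55

def c_t_and_beta_t(z):
    """Return (C_T in mK, beta_T) with C_T = T_bar * x_HI * b_T and beta_T = f / b_T."""
    return mean_brightness_temperature_mK(z) * X_HI * B_T, growth_rate(z) / B_T

if __name__ == "__main__":
    print(c_t_and_beta_t(2.2))
```

Under these assumptions the amplitude comes out at a fraction of a mK and the distortion parameter close to one half; both shift with the assumed cosmology.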

Consider next a field of view that is sufficiently small such that it may be treated as being
flat. We may then express the unit vector along the line of sight as
$\hat{n} = \hat{m} + \vec{\theta}$, where $\hat{m}$ is the line of sight to the centre of the field
of view and $\vec{\theta}$ is a two-dimensional (2D) vector on the sky ($|\vec{\theta}| \ll 1$). In
this flat sky approximation it is convenient to decompose $\delta_{\mathcal{F}}(\vec{\theta}, z)$
and $\delta_T(\vec{\theta}, z)$ into Fourier modes where we use $\mathbf{U}$ as the variable
conjugate to $\vec{\theta}$. Following Datta et al. (2007), we define the multi-frequency angular
power spectrum (MAPS) as

$$P_a(U, \Delta z) = \frac{1}{\pi r^2} \int_0^{\infty} dk_{\parallel}\, \cos(k_{\parallel} \Delta r)\, F_a(\mu)\, P(k) . \tag{3}$$

Here $a = T$ refers to the HI 21-cm brightness temperature fluctuation power spectrum of
$\delta_T(\vec{\theta}, z)$ and $\delta_T(\vec{\theta}, z + \Delta z)$ at two slightly different
redshifts $z$ and $z + \Delta z$. Similarly, $a = \mathcal{F}$ and $a = c$ respectively refer to the
Ly-α forest and $\delta_T$-$\delta_{\mathcal{F}}$ cross-correlation power spectra. In eq. (3),
$\Delta r$ is the radial comoving separation corresponding to $\Delta z$, $P(k)$ is the dark matter
power spectrum, $k = \sqrt{k_{\parallel}^2 + (2\pi U/r)^2}$ and $\mu = k_{\parallel}/k$. The function
$F_a(\mu)$ takes values $A^2(\mu)$, $B^2(\mu)$ and $A(\mu) B(\mu)$ corresponding to
$a = T$, $\mathcal{F}$ and $c$ respectively. Here $A(\mu) = C_T [1 + \beta_T \mu^2]$ and
$B(\mu) = C_{\mathcal{F}} [1 + \beta_{\mathcal{F}} \mu^2]$. We note that the MAPS $P_a(U, \Delta z)$,
which is directly related to observable quantities, contains the entire information of the three
dimensional (3D) power spectrum through its $U$ and $\Delta z$ dependence.
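
The MAPS integral can be evaluated with a simple quadrature. The sketch below is illustrative only and assumes the form (1/(π r²)) ∫ dk_∥ cos(k_∥ Δr) F_a(μ) P(k) with k² = k_∥² + (2πU/r)²: `toy_power_spectrum` is a hypothetical stand-in for the dark matter power spectrum, `R_COMOVING = 5700` Mpc is an assumed round number for the distance to z ~ 2.2, and `C_T`, `BETA_T` are placeholders. Only `C_F` and `BETA_F` are the values quoted in the text.

```python
import math

C_F, BETA_F = -0.13, 1.58      # Ly-alpha values quoted in the text
C_T, BETA_T = 1.0, 0.5         # placeholders (assumed); C_T in arbitrary units
R_COMOVING = 5700.0            # assumed comoving distance to z ~ 2.2, in Mpc

def toy_power_spectrum(k):
    """Hypothetical smooth P(k), used only to exercise the integral."""
    k0 = 0.02
    return k / (1.0 + (k / k0) ** 2) ** 2

def f_cross(mu):
    """F_c(mu) = A(mu) B(mu) for the cross-correlation."""
    return C_T * (1.0 + BETA_T * mu * mu) * C_F * (1.0 + BETA_F * mu * mu)

def maps(U, delta_r, F_of_mu=f_cross, r=R_COMOVING, k_par_max=5.0, n_steps=20000):
    """Trapezoidal estimate of the MAPS integral, truncated at k_par_max."""
    k_perp = 2.0 * math.pi * U / r
    h = k_par_max / n_steps
    total = 0.0
    for i in range(n_steps + 1):
        k_par = i * h
        k = math.hypot(k_par, k_perp)
        mu = k_par / k if k > 0.0 else 0.0
        weight = 0.5 if i in (0, n_steps) else 1.0   # trapezoid end-point weights
        total += weight * math.cos(k_par * delta_r) * F_of_mu(mu) * toy_power_spectrum(k)
    return total * h / (math.pi * r * r)

if __name__ == "__main__":
    print(maps(100.0, 0.0), maps(100.0, 10.0))
```

With a negative flux amplitude the cross-power is negative at zero separation, and its magnitude falls once the radial separation is non-zero because the cosine factor suppresses the integrand.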

Given a field of view, it will be possible to probe $\delta_{\mathcal{F}}$ only along a few,
discrete lines of sight corresponding to the angular positions of the bright quasars. We
incorporate this through a sampling function which is a sum of Dirac delta functions
$\rho(\vec{\theta}) = N^{-1} \sum_n \delta_D^2(\vec{\theta} - \vec{\theta}_n)$ where
$\vec{\theta}_n$ refers to the angular positions of the quasars and the summation extends up to
$N$, the number of quasars in the field of view. Taking into account the discrete sampling, the
observed Ly-α forest flux fluctuation may be written as
$\delta_{\mathcal{F}o}(\vec{\theta}) = \rho(\vec{\theta})\, \delta_{\mathcal{F}}(\vec{\theta})$. The
aim here being to detect the cross-correlation power spectrum, we define the estimator

$$\hat{E}(\mathbf{U}, \Delta z) = \frac{1}{2} \left[ \tilde{\delta}_{\mathcal{F}o}(\mathbf{U}, z)\, \tilde{\delta}^{*}_{T}(\mathbf{U}, z + \Delta z) + \tilde{\delta}^{*}_{\mathcal{F}o}(\mathbf{U}, z)\, \tilde{\delta}_{T}(\mathbf{U}, z + \Delta z) \right] \tag{4}$$



where tilde denotes the 2D Fourier transform. While it has been assumed that the Ly-α forest and
the HI 21-cm brightness temperature both trace the same underlying dark matter distribution,
with possibly different bias parameters, the quasars are assumed to be at a higher redshift and
hence uncorrelated with either $\mathcal{F}$ or $T$. Using this, the expectation value and variance
of the estimator are

$$\langle \hat{E}(\mathbf{U}, \Delta z) \rangle = P_c(U, \Delta z) \tag{5}$$

and

$$\langle (\Delta \hat{E})^2 \rangle = \frac{1}{2} P_c^2(U, \Delta z) + \frac{1}{2} \left[ P_T(U, 0) + N_T \right] \times \left[ \frac{1}{n} \int d^2U'\, P_{\mathcal{F}}(U', 0) + P_{\mathcal{F}}(U, 0) + N_{\mathcal{F}} \right] \tag{6}$$



where $N_T$ and $N_{\mathcal{F}}$ are the respective noise power spectra for $T$ and $\mathcal{F}$,
it being assumed that the two noises are uncorrelated. The quasars, with angular number density
$n$, have been assumed to be randomly distributed and their clustering has been ignored. We assume
that the variance $\sigma^2_{\mathcal{F}N}$ of the pixel noise contribution to $\delta_{\mathcal{F}}$
is the same across all the quasar spectra whereby we have
$N_{\mathcal{F}} = \sigma^2_{\mathcal{F}N}/n$ for its noise power spectrum. The integral in eq. (6)
can be simplified using eq. (3) to calculate $P_{\mathcal{F}}(U', 0)$ whereby

$$\int d^2U'\, P_{\mathcal{F}}(U', 0) = \int \frac{d^3k}{(2\pi)^3}\, C_{\mathcal{F}}^2\, [1 + \beta_{\mathcal{F}} \mu^2]^2\, P(k) = \sigma^2_{\mathcal{F}L} \tag{7}$$

where $\mathbf{k} = (2\pi \mathbf{U}/r, k_{\parallel})$ and $\sigma^2_{\mathcal{F}L}$ is the variance
of the fluctuations in the smoothed $\delta_{\mathcal{F}}$ arising from the large scale matter
fluctuations and peculiar velocities. We have the total variance of $\delta_{\mathcal{F}}$ as
$\sigma^2_{\mathcal{F}} = \sigma^2_{\mathcal{F}N} + \sigma^2_{\mathcal{F}L}$ whereby the variance of
the cross-correlation estimator is

$$\langle (\Delta \hat{E})^2 \rangle = \frac{1}{2} \left\{ P_c^2(U, \Delta z) + \left[ P_T(U, 0) + N_T \right] \left[ P_{\mathcal{F}}(U, 0) + \frac{\sigma^2_{\mathcal{F}}}{n} \right] \right\} \tag{8}$$



The cross-correlation signal being statistically isotropic on the sky, we may combine estimates
of the power spectrum over different directions of $\mathbf{U}$ to reduce the uncertainty (or
variance) in the estimated cross-correlation signal. Binning in $U$ and combining estimates at
different redshift values within the observational bandwidth lead to a further reduction of the
uncertainty. Finally, incorporating the possibility of observations in several independent fields
of view, we use $N_E$ to denote the total number of independent estimates that are combined. The
uncertainty or noise in the resulting combined estimate of $P_c(U, \Delta z)$ is
$\sigma^2 = \langle (\Delta \hat{E})^2 \rangle / N_E$. It is convenient to express our results in
terms of the angular multipole $\ell = 2\pi U$. We then have
$N_E = (\ell + \frac{1}{2})\, \Delta\ell\, (B/\Delta\nu)\, f\, N_F$ where $\Delta\ell$ is the width of
the $\ell$ bin, $B$ the frequency bandwidth of the 21-cm observation, $\Delta\nu$ the frequency
interval beyond which we have an independent estimate of the signal, $f$ the fraction of the sky
covered by a single field of view and $N_F$ the number of independent fields of view that are
observed.
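
The noise budget described above can be assembled into a few small functions. This is a minimal sketch under stated assumptions: the variance is taken in the form ½{P_c² + [P_T + N_T][P_F + σ_F²/n]}, the number of independent estimates as (ℓ + ½) Δℓ (B/Δν) f N_F, and the sky fraction of one field is computed from a Gaussian beam solid angle, which is an assumption of this sketch. The function names are hypothetical, and the quasar density must be supplied per steradian if the power spectra are in steradian units.

```python
import math

def field_sky_fraction(fwhm_deg):
    """Sky fraction of one field, assuming a Gaussian beam: Omega = pi/(4 ln 2) * FWHM^2."""
    fwhm = math.radians(fwhm_deg)
    solid_angle = math.pi / (4.0 * math.log(2.0)) * fwhm ** 2
    return solid_angle / (4.0 * math.pi)

def estimator_variance(P_c, P_T, N_T, P_F, sigma_F2, n_qso):
    """0.5 * {P_c^2 + [P_T + N_T] [P_F + sigma_F^2 / n]}, with n in quasars per steradian."""
    return 0.5 * (P_c ** 2 + (P_T + N_T) * (P_F + sigma_F2 / n_qso))

def n_estimates(ell, d_ell, bandwidth, d_nu, f_sky, n_fields=1):
    """N_E = (ell + 1/2) * d_ell * (B / d_nu) * f * N_F."""
    return (ell + 0.5) * d_ell * (bandwidth / d_nu) * f_sky * n_fields

def snr(P_c, variance, n_est):
    """Signal-to-noise of the combined estimate, |P_c| / sqrt(variance / N_E)."""
    return abs(P_c) / math.sqrt(variance / n_est)
```

For fixed 21-cm parameters the only handle in this expression is the quasar density: raising it shrinks the σ_F²/n term and so raises the signal-to-noise ratio.
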
Figure 2. The decorrelation of $P_c(\ell, \Delta z)$ with increasing $\Delta z$ for the representative $\ell$ values shown in the figure.

Figure 3. The HI 21-cm angular power spectrum for z = 2.2 with $3\sigma$ error bars.



3 DETECTABILITY. 

We next estimate the survey parameters that will be required to detect the cross-correlation signal.
It is, in principle, possible to vary the parameters of both the redshifted 21-cm survey and the Ly-α
survey. To keep the analysis simple we restrict our attention to a situation where the parameters of
the redshifted 21-cm survey are fixed, and vary the parameters of only the Ly-α survey.

The quasar distribution is known to peak between z = 2 and 3. For any particular quasar,
it is possible to reliably estimate $\delta_{\mathcal{F}}$ in a small redshift range close to the
quasar's redshift. The region very close to the quasar is excluded due to the quasar's Strömgren
sphere, and large redshift separations are excluded to avoid Ly-β contamination. Based on this we
have only considered quasars in the z range 2.2 to 3.0 for our estimates, and we have chosen a
region centered at z = 2.2 for our estimates.

The predicted HI-$\mathcal{F}$ cross-correlation angular power spectrum is shown (Figures 1 and 2)
assuming cosmological parameters from WMAP 5 results (Komatsu et al., 2009). The $\ell$ dependence
closely follows that of the HI angular power spectrum (Figure 3). The signals at two different
redshifts $z$ and $z + \Delta z$, we find, decorrelate rapidly with increasing $\Delta z$, the
decline being faster at larger $\ell$ values.

8 Guha Sarkar, Bharadwaj, Choudhury & Datta

The currently functioning GMRT (Swarup et al., 1991) can, in principle, be used to probe
the redshifted HI 21-cm signal all the way from $z \sim 0$ to $z \sim 8$ (Bharadwaj & Ali, 2005).
The GMRT, at present, has neither the exact frequency band nor the desired sensitivity for the
proposed observation. In principle it would not be very difficult, in future, to cover the required
frequency and increase the number of antennas to increase the sensitivity. For the present analysis
we consider a hypothetical array, possibly an extended version of the GMRT or some other future
radio telescope with 60 antennas similar to the GMRT, distributed randomly over a 1 km × 1 km
square. Note that this is roughly 4 times the number of antennas currently available in the
GMRT central square, and henceforth we refer to this as the extended GMRT (EGMRT). Each
antenna is 45 m in diameter, with a field of view 1.3° (FWHM), total system temperature 100 K
and antenna gain 0.33 K/Jy. We assume that the observations are carried out over a frequency band
of 32 MHz centered at 430 MHz using channels of width 62.5 kHz each. The frequency separation
$\Delta\nu$ over which the 21-cm signal remains correlated roughly scales as
$\Delta\nu = 1\,{\rm MHz}\,(\ell/100)^{-0.7}$ (Bharadwaj & Pandey, 2003). We assume that the signal
is averaged over frequency bins of this width to increase the SNR. Considering 1000 hrs of
observation in a single field of view, a $3\sigma$ (or better) measurement of the HI 21-cm power
spectrum will be possible at $\ell < 4000$ (Figure 3). Note that the error is dominated by the
system noise, the cosmic variance being considerably smaller. This justifies why we have considered
observations in a single field of view instead of distributing the 1000 hr observation over
different fields.

Considering next the Ly-α forest surveys, we note that these typically cover a much larger
angular region and redshift interval compared to the 21-cm observation that we have considered.
For example, the SDSS Data Release 3 (Schneider et al., 2005), whose data is currently available,
covers 4188 deg$^2$ of the sky. The number density of quasars in the redshift range z = 2.2 to 3.0
is $n = 1.0\ {\rm deg}^{-2}$ for this survey. The cross-correlation is restricted to the angular
extent of the 21-cm observation and the redshift interval $\Delta z = 0.24$ centered at z = 2.2
which corresponds to a bandwidth of 32 MHz centered at 430 MHz. The channel width 62.5 kHz of the
21-cm observations corresponds to $\Delta z = 4.7 \times 10^{-4}$ or equivalently
$v_{\parallel} = 44$ km/s.
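
The conversions between frequency width, redshift interval and line-of-sight velocity quoted above can be reproduced approximately with a few lines. The sketch assumes the narrow-band relations Δz = (1 + z) Δν/ν_obs and v = c Δν/ν_obs, and simply plugs in the central redshift and observing frequency stated in the text; the function names are hypothetical.

```python
C_KM_S = 299792.458   # speed of light in km/s

def redshift_interval(delta_nu_mhz, nu_obs_mhz, z):
    """Delta z = (1 + z) * delta_nu / nu_obs for a narrow band observed at nu_obs."""
    return (1.0 + z) * delta_nu_mhz / nu_obs_mhz

def velocity_width_km_s(delta_nu_mhz, nu_obs_mhz):
    """Line-of-sight velocity width c * delta_nu / nu_obs."""
    return C_KM_S * delta_nu_mhz / nu_obs_mhz

channel_dz = redshift_interval(0.0625, 430.0, 2.2)   # one 62.5 kHz channel
band_dz = redshift_interval(32.0, 430.0, 2.2)        # full 32 MHz band
channel_v = velocity_width_km_s(0.0625, 430.0)       # channel width in km/s
```

These evaluate to roughly 4.7 × 10⁻⁴, 0.24 and 44 km/s, in line with the rounded values quoted in the text.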

The analysis of fluctuations in the Lyman-α forest (D'Odorico et al., 2006; Coppolani et al.,
2006) shows that the variance has a value $\sigma^2_{\mathcal{F}L} \approx 0.02$ for
$\delta_{\mathcal{F}}$ smoothed over ~ 50 km/s along the line of sight. This smoothing is comparable
to the channel width of the 21-cm observations, and for simplicity we assume that the Ly-α pixel
length is exactly the same as the 21-cm channel width. Note that this is smaller than the typical
$\Delta z$ value where the signal decorrelates (Figure 2). For the pixel noise contribution, we
assume that S/N = 5 for every pixel of the spectra used to estimate the cross-correlation. This
gives $\sigma^2_{\mathcal{F}N} = 0.04$, whereby $\sigma^2_{\mathcal{F}} = 0.06$ for pixels of length
44 km/s or 62.5 kHz. As noted earlier, it is advantageous to average the signal over an interval
$\Delta\nu = 1\,{\rm MHz}\,(\ell/100)^{-0.7} > 62.5$ kHz before correlating. The value of
$\sigma^2_{\mathcal{F}}$ will come down due to this averaging. Assuming that the pixel noise in
different pixels is uncorrelated, we have
$\sigma^2_{\mathcal{F}N} = 0.04\,(62.5\,{\rm kHz}/\Delta\nu)$. Analysis of the line of sight
correlation function of $\delta_{\mathcal{F}}$ indicates that we may expect
$\sigma^2_{\mathcal{F}L}$ to scale faster than $(\Delta\nu)^{-1}$. For the purpose of this paper we
assume that both $\sigma^2_{\mathcal{F}L}$ and $\sigma^2_{\mathcal{F}N}$ have the same scaling
whereby $\sigma^2_{\mathcal{F}} = 0.06\,(62.5\,{\rm kHz}/\Delta\nu)$, which we use in eq. (8) for our
noise estimates. The error introduced by the last assumption will, at worst, cause the noise for
the cross-correlation signal to be over-estimated.
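
The averaging argument above reduces to a one-line scaling. The sketch below assumes the decorrelation width 1 MHz (ℓ/100)^(-0.7), with the exponent taken to be -0.7, and a total per-pixel flux variance of 0.06 at the 62.5 kHz channel width; the function names are hypothetical.

```python
def decorrelation_width_mhz(ell):
    """Assumed scaling: delta_nu = 1 MHz * (ell / 100)^(-0.7)."""
    return (ell / 100.0) ** -0.7

def flux_variance(ell, sigma2_pixel=0.06, channel_mhz=0.0625):
    """Flux variance after averaging pixels over the decorrelation width at multipole ell."""
    return sigma2_pixel * channel_mhz / decorrelation_width_mhz(ell)

if __name__ == "__main__":
    for ell in (100, 1000, 4000):
        print(ell, decorrelation_width_mhz(ell), flux_variance(ell))
```

Because the decorrelation width narrows with increasing multipole, fewer channels are averaged at large multipoles and the effective variance grows, while the width stays above one channel throughout the multipole range considered.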

We present noise estimates (Figure 1) considering quasar angular number densities $n = 1, 4, 16$
and $64\ {\rm deg}^{-2}$. While our intention is primarily to estimate the quasar number density
that will be required to detect the cross-correlation signal, we note that the $n$ values chosen
are viable with existing or future surveys. The currently available SDSS (Schneider et al., 2005)
has $n \sim 1\ {\rm deg}^{-2}$ and the upcoming BOSS (McDonald et al., 2005) is expected to have
$n \sim 16\ {\rm deg}^{-2}$, while the proposed future BIGBOSS (Schlegel et al., 2009) is
anticipated to reach $n > 64\ {\rm deg}^{-2}$. We find that a $3\sigma$ and $5\sigma$ detection will
be possible at $\ell < 2000$ for $n = 1$ and $4\ {\rm deg}^{-2}$ respectively. A $5\sigma$ (or
better) detection will be possible over the entire $\ell$ range for $n = 16$ and
$64\ {\rm deg}^{-2}$. There is a further reduction of noise by a factor $N_F^{1/2}$ if the same
observation is repeated in multiple fields of view.

Unlike the $\delta_{\mathcal{F}}$ auto-correlation power spectrum which is Poisson noise dominated,
the cross-correlation signal itself is not affected by the discrete quasar sampling. However its
variance is very sensitive to this, and a dense quasar sampling will allow the cross-correlation
to be measured at a high level of precision.
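
A toy Monte Carlo illustrates the point that sparse sampling inflates the scatter of a cross-correlation estimate without biasing it. This is a real-space, zero-lag analogue and not the Fourier-space estimator of Section 2: two noisy maps share a common Gaussian field, one of them is read off only at randomly chosen pixels standing in for quasar sightlines, and all amplitudes and noise levels are arbitrary assumptions.

```python
import random
import statistics

def one_estimate(n_sampled, n_pixels=1000, c_f=-0.5, noise=0.5, rng=random):
    """Zero-lag cross-correlation of T with F, using F only at n_sampled random pixels."""
    s = [rng.gauss(0.0, 1.0) for _ in range(n_pixels)]       # shared underlying field
    t = [x + rng.gauss(0.0, noise) for x in s]                # 21-cm-like map
    f = [c_f * x + rng.gauss(0.0, noise) for x in s]          # flux-like map
    sightlines = rng.sample(range(n_pixels), n_sampled)       # discrete "quasar" positions
    return sum(f[i] * t[i] for i in sightlines) / n_sampled

def mean_and_scatter(n_sampled, n_realisations=200, seed=1):
    """Mean and standard deviation of the estimate over independent realisations."""
    rng = random.Random(seed)
    estimates = [one_estimate(n_sampled, rng=rng) for _ in range(n_realisations)]
    return statistics.mean(estimates), statistics.stdev(estimates)

if __name__ == "__main__":
    print("sparse:", mean_and_scatter(20))
    print("dense: ", mean_and_scatter(500))
```

Both cases should recover a mean close to the input amplitude of -0.5, with the sparse case showing a visibly larger scatter.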

The discussion so far has completely bypassed several observational difficulties which pose
a severe challenge. Considering first the Ly-α forest, errors in continuum fitting and subtraction
would result in an additive error in the estimated $\delta_{\mathcal{F}}$ which will inhibit
recovery of the underlying power spectrum at large scales. Croft et al. (2002) and McDonald et al.
(2006) have studied this issue extensively and have proposed several techniques to mitigate the
contribution from such errors. While an additive error in $\delta_{\mathcal{F}}$ could have severe
repercussions for the large-scale power spectrum estimated from the auto-correlation of
$\delta_{\mathcal{F}}$ (Kim et al., 2004), we do not expect these errors to be correlated with the
21-cm data. An additive error in $\delta_{\mathcal{F}}$ will manifest itself as an extra
contribution to the noise for the cross-correlation power spectrum (eq. 8) which, in turn, may
degrade the SNR and hence affect the detectability of the cross-correlation signal.

The redshifted 21-cm signal is buried under foregrounds which are several orders of magnitude
larger (Shaver et al., 1999; Di Matteo et al., 2002; Santos et al., 2005; Wang & Hu, 2006; Ali
et al., 2008; Bernardi et al., 2009; Pen et al., 2009; Ghosh et al., 2010). Extragalactic point
sources and the diffuse synchrotron radiation from our own Galaxy are the two most dominant
foreground components. The free-free emissions from our Galaxy and external galaxies make much
smaller contributions (Shaver et al., 1999), though each of these is individually larger than the
HI signal. Several different techniques have been proposed for separating the 21-cm signal from
the foregrounds. All of these depend on the fact that the foregrounds are expected to have a
continuum frequency spectrum, and their contribution at two different frequencies separated by
$\Delta\nu$ is expected to be correlated well beyond $\Delta\nu \sim 5$ MHz. The 21-cm signal,
however, is predicted to decorrelate within $\Delta\nu \sim 1$ MHz for angular scales of our
interest (Bharadwaj & Sethi, 2001). A possible technique for foreground removal is to subtract out
any smooth frequency dependent component either from the image cube (Jelić et al., 2008; Bowman
et al., 2009; Liu et al., 2009) or from the gridded visibilities (Liu et al., 2009). Another
possible approach is to first estimate the multi-frequency angular power spectrum of the
radio-interferometric data and then subtract out any component that remains correlated over large
frequency separations (Ali et al., 2008; Ghosh et al., 2010).
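
The idea of subtracting a smooth frequency dependent component can be illustrated on a single line of sight. The sketch below is a generic toy and not the method of any of the cited papers: it fits a straight line across frequency channels by ordinary least squares and subtracts it, using an invented linear foreground and an invented rapidly oscillating signal.

```python
import math

def fit_line(x, y):
    """Ordinary least-squares straight line y ~ a + b x; returns (a, b)."""
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    b = sxy / sxx
    return y_mean - b * x_mean, b

def subtract_smooth_component(channels, spectrum):
    """Remove the best-fit straight line in frequency from one line-of-sight spectrum."""
    a, b = fit_line(channels, spectrum)
    return [y - (a + b * x) for x, y in zip(channels, spectrum)]

channels = list(range(64))
foreground = [1000.0 + 5.0 * i for i in channels]            # toy smooth foreground
signal = [math.sin(math.pi * i / 2.0) for i in channels]     # toy rapidly varying signal
observed = [fg + s for fg, s in zip(foreground, signal)]
residual = subtract_smooth_component(channels, observed)
```

The residual tracks the toy signal closely even though the foreground is three orders of magnitude larger; the small mismatch arises because the fit also absorbs the signal's own weak linear trend, which is the kind of subtraction error discussed in the text.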

The foregrounds of the redshifted 21-cm signal are expected to be uncorrelated with the Ly-α
forest $\delta_{\mathcal{F}}$ and also any errors arising in it from continuum subtraction. We do
not expect the foregrounds to contribute to the estimated cross-correlation signal, and we
anticipate that the problem of foreground removal will be considerably less severe as compared to
the auto-correlation. Errors in foreground subtraction will manifest as an extra source of noise
for the cross-correlation signal. The fact that the 21-cm $\delta_T$ and the Ly-α
$\delta_{\mathcal{F}}$ at two different redshifts separated by $\Delta z$ decorrelate rapidly as
$\Delta z$ is increased (Figure 2) should help in identifying any foreground contamination.

Errors in calibrating the radio observations are another possible source of uncertainty in the
21-cm signal. These will lead to errors in the overall amplitude of the cross-correlation signal,
or equivalently contribute to uncertainties in estimates of the quantity $C_{\mathcal{F}} C_T$
defined in Section 2.

In conclusion, we propose the 21-cm and Ly-α forest cross-correlation signal as a tool to
measure the large-scale matter distribution. The problem of foreground removal is expected to be 



Cross-correlation of the HI 21 -cm Signal and Lyman-a Forest: A Probe Of Cosmology 1 1 

considerably less severe for the cross-correlation than for the 21-cm auto-correlation signal. The
cross-correlation signal will probe a variety of issues like the astrophysics of the diffuse IGM, the 
growth of large-scale structures and the expansion history of the Universe. 